tree classifier
A Novel Ensemble Learning Approach for Enhanced IoT Attack Detection: Redefining Security Paradigms in Connected Systems
Abdeljaber, Hikmat A. M., Hossain, Md. Alamgir, Ahmad, Sultan, Alsanad, Ahmed, Haque, Md Alimul, Jha, Sudan, Nazeer, Jabeen
The rapid expansion of Internet of Things (IoT) devices has transformed industries and daily life by enabling widespread connectivity and data exchange. However, this increased interconnection has introduced serious security vulnerabilities, making IoT systems more exposed to sophisticated cyber attacks. This study presents a novel ensemble learning architecture designed to improve IoT attack detection. The proposed approach applies advanced machine learning techniques, specifically the Extra Trees Classifier, along with thorough preprocessing and hyperparameter optimization. It is evaluated on several benchmark datasets including CICIoT2023, IoTID20, BotNeTIoT L01, ToN IoT, N BaIoT, and BoT IoT. The results show excellent performance, achieving high recall, accuracy, and precision with very low error rates. These outcomes demonstrate the model efficiency and superiority compared to existing approaches, providing an effective and scalable method for securing IoT environments. This research establishes a solid foundation for future progress in protecting connected devices from evolving cyber threats.
- Asia > Singapore (0.04)
- Asia > Nepal > Bagmati Province > Kathmandu District > Kathmandu (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- (7 more...)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.34)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Context-Specific Refinements of Bayesian Network Classifiers
Leonelli, Manuele, Varando, Gherardo
Supervised classification is one of the most ubiquitous tasks in machine learning. Generative classifiers based on Bayesian networks are often used because of their interpretability and competitive accuracy. The widely used naive and TAN classifiers are specific instances of Bayesian network classifiers with a constrained underlying graph. This paper introduces novel classes of generative classifiers extending TAN and other famous types of Bayesian network classifiers. Our approach is based on staged tree models, which extend Bayesian networks by allowing for complex, context-specific patterns of dependence. We formally study the relationship between our novel classes of classifiers and Bayesian networks. We introduce and implement data-driven learning routines for our models and investigate their accuracy in an extensive computational study. The study demonstrates that models embedding asymmetric information can enhance classification accuracy.
- North America > United States > Wisconsin (0.05)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
Machine Learning Classification of Alzheimer's Disease Stages Using Cerebrospinal Fluid Biomarkers Alone
Tiwari, Vivek Kumar, Indic, Premananda, Tabassum, Shawana
Early diagnosis of Alzheimer's disease is a challenge because the existing methodologies do not identify the patients in their preclinical stage, which can last up to a decade prior to the onset of clinical symptoms. Several research studies demonstrate the potential of cerebrospinal fluid biomarkers, amyloid beta 1-42, T-tau, and P-tau, in early diagnosis of Alzheimer's disease stages. In this work, we used machine learning models to classify different stages of Alzheimer's disease based on the cerebrospinal fluid biomarker levels alone. An electronic health record of patients from the National Alzheimer's Coordinating Centre database was analyzed and the patients were subdivided based on mini-mental state scores and clinical dementia ratings. Statistical and correlation analyses were performed to identify significant differences between the Alzheimer's stages. Afterward, machine learning classifiers including K-Nearest Neighbors, Ensemble Boosted Tree, Ensemble Bagged Tree, Support Vector Machine, Logistic Regression, and Naïve Bayes classifiers were employed to classify the Alzheimer's disease stages. The results demonstrate that Ensemble Boosted Tree (84.4%) and Logistic Regression (73.4%) provide the highest accuracy for binary classification, while Ensemble Bagged Tree (75.4%) demonstrates better accuracy for multiclassification. The findings from this research are expected to help clinicians in making an informed decision regarding the early diagnosis of Alzheimer's from the cerebrospinal fluid biomarkers alone, monitoring of the disease progression, and implementation of appropriate intervention measures.
- North America > United States > Texas > Smith County > Tyler (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Predicting environment effects on breast cancer by implementing machine learning
Farooq, Muhammad Shoaib, Ilyas, Mehreen
The biggest Breast cancer is increasingly a major factor in female fatalities, overtaking heart disease. While genetic factors are important in the growth of breast cancer, new research indicates that environmental factors also play a substantial role in its occurrence and progression. The literature on the various environmental factors that may affect breast cancer risk, incidence, and outcomes is thoroughly reviewed in this study report. The study starts by looking at how lifestyle decisions, such as eating habits, exercise routines, and alcohol consumption, may affect hormonal imbalances and inflammation, two important factors driving the development of breast cancer. Additionally, it explores the part played by environmental contaminants such pesticides, endocrine-disrupting chemicals (EDCs), and industrial emissions, all of which have been linked to a higher risk of developing breast cancer due to their interference with hormone signaling and DNA damage. Algorithms for machine learning are used to express predictions. Logistic Regression, Random Forest, KNN Algorithm, SVC and extra tree classifier. Metrics including the confusion matrix correlation coefficient, F1-score, Precision, Recall, and ROC curve were used to evaluate the models. The best accuracy among all the classifiers is Random Forest with 0.91% accuracy and ROC curve 0.901% of Logistic Regression. The accuracy of the multiple algorithms for machine learning utilized in this research was good, which is important and indicates that these techniques could serve as replacement forecasting techniques in breast cancer survival analysis, notably in the Asia region.
- Asia > Singapore (0.04)
- Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
- Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models
Talukder, Md. Simul Hasan, Sulaiman, Rejwan Bin
Epilepsy is a prevalent neurological disorder characterized by recurrent and unpredictable seizures, necessitating accurate prediction for effective management and patient care. Application of machine learning (ML) on electroencephalogram (EEG) recordings, along with its ability to provide valuable insights into brain activity during seizures, is able to make accurate and robust seizure prediction an indispensable component in relevant studies. In this research, we present a comprehensive comparative analysis of five machine learning models - Random Forest (RF), Decision Tree (DT), Extra Trees (ET), Logistic Regression (LR), and Gradient Boosting (GB) - for the prediction of epileptic seizures using EEG data. The dataset underwent meticulous preprocessing, including cleaning, normalization, outlier handling, and oversampling, ensuring data quality and facilitating accurate model training. These preprocessing techniques played a crucial role in enhancing the models' performance. The results of our analysis demonstrate the performance of each model in terms of accuracy. The LR classifier achieved an accuracy of 56.95%, while GB and DT both attained 97.17% accuracy. RT achieved a higher accuracy of 98.99%, while the ET model exhibited the best performance with an accuracy of 99.29%. Our findings reveal that the ET model outperformed not only the other models in the comparative analysis but also surpassed the state-of-the-art results from previous research. The superior performance of the ET model makes it a compelling choice for accurate and robust epileptic seizure prediction using EEG data.
- Asia > Singapore (0.04)
- Asia > Bangladesh (0.04)
- Health & Medicine > Therapeutic Area > Neurology > Epilepsy (1.00)
- Health & Medicine > Therapeutic Area > Genetic Disease (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Predictive Maintenance of Armoured Vehicles using Machine Learning Approaches
Sengupta, Prajit, Mehta, Anant, Rana, Prashant Singh
Armoured vehicles are specialized and complex pieces of machinery designed to operate in high-stress environments, often in combat or tactical situations. This study proposes a predictive maintenance-based ensemble system that aids in predicting potential maintenance needs based on sensor data collected from these vehicles. The proposed model's architecture involves various models such as Light Gradient Boosting, Random Forest, Decision Tree, Extra Tree Classifier and Gradient Boosting to predict the maintenance requirements of the vehicles accurately. In addition, K-fold cross validation, along with TOPSIS analysis, is employed to evaluate the proposed ensemble model's stability. The results indicate that the proposed system achieves an accuracy of 98.93%, precision of 99.80% and recall of 99.03%. The algorithm can effectively predict maintenance needs, thereby reducing vehicle downtime and improving operational efficiency. Through comparisons between various algorithms and the suggested ensemble, this study highlights the potential of machine learning-based predictive maintenance solutions.
- Asia > India (0.05)
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Government > Military (1.00)
- Health & Medicine (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)
- (2 more...)
Assessing Dataset Quality Through Decision Tree Characteristics in Autoencoder-Processed Spaces
Mazurek, Szymon, Wielgosz, Maciej
In this paper, we delve into the critical aspect of dataset quality assessment in machine learning classification tasks. Leveraging a variety of nine distinct datasets, each crafted for classification tasks with varying complexity levels, we illustrate the profound impact of dataset quality on model training and performance. We further introduce two additional datasets designed to represent specific data conditions - one maximizing entropy and the other demonstrating high redundancy. Our findings underscore the importance of appropriate feature selection, adequate data volume, and data quality in achieving high-performing machine learning models. To aid researchers and practitioners, we propose a comprehensive framework for dataset quality assessment, which can help evaluate if the dataset at hand is sufficient and of the required quality for specific tasks. This research offers valuable insights into data assessment practices, contributing to the development of more accurate and robust machine learning models.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Poland > Lesser Poland Province > Kraków (0.04)
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
A new class of generative classifiers based on staged tree models
Carli, Federico, Leonelli, Manuele, Varando, Gherardo
Generative models for classification use the joint probability distribution of the class variable and the features to construct a decision rule. Among generative models, Bayesian networks and naive Bayes classifiers are the most commonly used and provide a clear graphical representation of the relationship among all variables. However, these have the disadvantage of highly restricting the type of relationships that could exist, by not allowing for context-specific independences. Here we introduce a new class of generative classifiers, called staged tree classifiers, which formally account for context-specific independence. They are constructed by a partitioning of the vertices of an event tree from which conditional independence can be formally read. The naive staged tree classifier is also defined, which extends the classic naive Bayes classifier whilst retaining the same complexity. An extensive simulation study shows that the classification accuracy of staged tree classifiers is competitive with those of state-of-the-art classifiers. An applied analysis to predict the fate of the passengers of the Titanic highlights the insights that the new class of generative classifiers can give.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- Europe > Italy (0.04)
- Health & Medicine (0.46)
- Transportation (0.34)
Adversarial Edit Attacks for Tree Data
Many machine learning models can be attacked with adversarial examples, i.e. inputs close to correctly classified examples that are classified incorrectly. However, most research on adversarial attacks to date is limited to vectorial data, in particular image data. In this contribution, we extend the field by introducing adversarial edit attacks for tree-structured data with potential applications in medicine and automated program analysis. Our approach solely relies on the tree edit distance and a logarithmic number of black-box queries to the attacked classifier without any need for gradient information. We evaluate our approach on two programming and two biomedical data sets and show that many established tree classifiers, like tree-kernel-SVMs and recursive neural networks, can be attacked effectively.
Candidates v.s. Noises Estimation for Large Multi-Class Classification Problem
This paper proposes a method for multi-class classification problems, where the number of classes $K$ is large. The method, referred to as {\em Candidates v.s. Noises Estimation} (CANE), selects a small subset of candidate classes and samples the remaining classes. We show that CANE is always consistent and computationally efficient. Moreover, the resulting estimator has low statistical variance approaching that of the maximum likelihood estimator, when the observed label belongs to the selected candidates with high probability. In practice, we use a tree structure with leaves as classes to promote fast beam search for candidate selection. We also apply the CANE method to estimate word probabilities in neural language models. Experiments show that CANE achieves better prediction accuracy over the Noise-Contrastive Estimation (NCE), its variants and a number of the state-of-the-art tree classifiers, while it gains significant speedup compared to the standard $\mathcal{O}(K)$ methods.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)